feat(source/cloud-storage): add Cloud Storage source with list_objects and read_object tools #3081
Open
huangjiahua wants to merge 6 commits into googleapis:main
Code Review
This pull request adds Google Cloud Storage integration, introducing a new source and tools for listing and reading objects. The implementation includes configuration, error handling, and tests. Feedback recommends capping listing page sizes at 1000 for consistency, implementing memory safety limits when reading objects, and updating documentation titles to include the 'Tool' suffix.
…s and read_object tools

Adds a new project-scoped `cloud-storage` source using ADC, plus two read-only tools: `cloud-storage-list-objects` (with prefix/delimiter/pagination) and `cloud-storage-read-object` (with HTTP-style byte range and base64 payload).

Introduces a GCS-aware error classifier in `cloudstoragecommon` that splits failures into Agent errors (missing bucket/object, bad request, unsatisfiable range) and Server errors (auth, IAM denial, quota, 5xx, cancellation) per DEVELOPER.md, replacing the coarse-grained `util.ProcessGcpError`.

Ships YAML-parse unit tests, an error-classifier unit test, a range-parser unit test, a live-GCS integration test (12 sub-tests, UUID-suffixed bucket with self-cleanup), docs under `docs/en/integrations/cloud-storage/`, and a `cloud-storage` CI shard.

The remaining 12 tools from the approved design doc land in follow-up PRs.
…dObject at 1 MiB

- ListObjects: pageSize() now clamps to the GCS API max of 1000 so callers that pass a larger max_results don't pre-allocate oversized buffers.
- ReadObject: reject objects/ranges over 1 MiB with the new sentinel cloudstoragecommon.ErrReadSizeLimitExceeded, which the classifier maps to an Agent error so the LLM can retry with a narrower 'range'.
- Docs + integration tests updated (two new sub-tests: oversize rejection and oversize-narrowed-by-range success).
… MiB

8 MiB gives agents more headroom for typical text/JSON/log payloads while still guarding against OOM. Doc and the oversize integration seed updated to match.
…ckage

DefaultMaxReadBytes doesn't belong in errors.go: the limit is a source-side invariant, not an error-classification concern. The sentinel ErrReadSizeLimitExceeded stays in cloudstoragecommon because the classifier still needs to recognize it.
…geSize bounds

The cleanup loop in the integration test was treating any iterator error as iterator.Done; it now distinguishes the two and logs non-Done errors so flaky teardowns are debuggable. Also adds an internal unit test for pageSize covering 0, negative, in-range, and over-cap inputs.
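For reference, the four pageSize cases named above could look like the sketch below. The cap of 1000 comes from the earlier commit; treating zero and negative values as "unset" and falling back to the cap is an assumption here, and `maxPageSize` is a hypothetical constant name.

```go
package main

import "fmt"

// maxPageSize is the per-page cap; the constant name is hypothetical.
const maxPageSize = 1000

// pageSize clamps a caller-supplied max_results to a usable page size.
// Assumption: zero or negative means "not set" and falls back to the cap.
func pageSize(maxResults int) int {
	if maxResults <= 0 || maxResults > maxPageSize {
		return maxPageSize
	}
	return maxResults
}

func main() {
	// The four input classes the unit test is described as covering.
	for _, in := range []int{0, -5, 50, 5000} {
		fmt.Printf("pageSize(%d) = %d\n", in, pageSize(in))
	}
}
```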
MCP tool results only carry text today, so the previous base64-encoded content was unusable by the LLM. Validate object bytes with utf8.Valid and return plain-text content; non-UTF-8 objects surface as an agent-fixable ErrBinaryContent error. TODO notes mark the spots to revisit once MCP supports embedded resources.
91a222a to 4919821
Description
Adds Google Cloud Storage as a first-class source in MCP Toolbox, enabling LLM agents to work with objects across buckets in a GCP project. The source is project-scoped and authenticates via Application Default Credentials, mirroring Firestore/Bigtable.
This first PR ships the source plus two read-only tools from the approved design (14 total):
- `cloud-storage-list-objects`: prefix filter, delimiter-based grouping (returns `prefixes`), and pagination via `max_results`/`page_token`. Passes through whatever metadata the GCS client returns (`*storage.ObjectAttrs`) so we don't have to plumb new fields later.
- `cloud-storage-read-object`: reads an object's bytes, textual data only, with optional HTTP-style byte ranges (`bytes=0-999`, `bytes=-500`, `bytes=500-`).

GCS-aware error categorization (per DEVELOPER.md) is implemented in a new `cloudstoragecommon` helper that maps GCS sentinels and `*googleapi.Error` codes to Agent errors (missing bucket/object, bad request, unsatisfiable range) vs. Server errors (auth, IAM denial, quota, 5xx, context cancellation). This replaces the coarse `util.ProcessGcpError` for the two new tools.

Remaining 12 tools from the design doc (`list_buckets`, `create_bucket`, `copy/move/delete_object`, etc.) will land in follow-up PRs.

CI note: the `cloud-storage` shard in `.ci/integration.cloudbuild.yaml` expects `CLOUD_STORAGE_PROJECT=$PROJECT_ID` and requires the test service account to have a Cloud Storage admin role in the test project. The integration test self-manages its own UUID-suffixed bucket with defer-based cleanup.

PR Checklist
- … `!` if this involves a breaking change

What's included
- `internal/sources/cloudstorage/` (+ YAML-parse unit tests)
- `internal/tools/cloudstorage/cloudstoragelistobjects/`, `.../cloudstoragereadobject/` (+ YAML-parse + range-parser unit tests)
- `cloudstoragecommon` error classifier (+ 17-case unit test covering sentinels, HTTP statuses, `context.Canceled`/`DeadlineExceeded`, and fallback)
- `tests/cloudstorage/cloud_storage_integration_test.go`: 12 sub-tests against a real bucket (self-created, self-cleaned)
- `docs/en/integrations/cloud-storage/` (source + both tool pages; passes `.ci/lint-docs-{source,tool}-page.sh`)
- `cloud-storage` in `.ci/integration.cloudbuild.yaml`
- `cloud.google.com/go/storage` v1.62.1

Opening as draft for initial review; happy to split the error-classifier refactor into a separate commit if reviewers prefer.
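For reviewers who want the three range forms spelled out, here is a rough sketch of how `bytes=0-999`, `bytes=-500`, and `bytes=500-` could be turned into an (offset, length) pair. `parseRange` is a hypothetical helper, not the parser in this PR. The return convention (length -1 for "through the end", negative offset for "last N bytes") is an assumption chosen to line up with how the Go storage client's `NewRangeReader` takes its arguments, as far as I understand it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRange is a hypothetical helper for a single HTTP-style byte range.
// It returns (offset, length): length -1 means "through the end of the object",
// and a negative offset means "the last -offset bytes".
func parseRange(s string) (int64, int64, error) {
	spec, ok := strings.CutPrefix(s, "bytes=")
	if !ok {
		return 0, 0, fmt.Errorf("range %q must start with \"bytes=\"", s)
	}
	first, last, ok := strings.Cut(spec, "-")
	if !ok || (first == "" && last == "") {
		return 0, 0, fmt.Errorf("range %q is malformed", s)
	}
	if first == "" { // suffix form: bytes=-N
		n, err := strconv.ParseInt(last, 10, 64)
		if err != nil || n <= 0 {
			return 0, 0, fmt.Errorf("range %q has an invalid suffix length", s)
		}
		return -n, -1, nil
	}
	start, err := strconv.ParseInt(first, 10, 64)
	if err != nil || start < 0 {
		return 0, 0, fmt.Errorf("range %q has an invalid start", s)
	}
	if last == "" { // open-ended form: bytes=N-
		return start, -1, nil
	}
	end, err := strconv.ParseInt(last, 10, 64)
	if err != nil || end < start {
		return 0, 0, fmt.Errorf("range %q has an invalid end", s)
	}
	return start, end - start + 1, nil // HTTP ranges are inclusive on both ends
}

func main() {
	for _, r := range []string{"bytes=0-999", "bytes=-500", "bytes=500-"} {
		offset, length, err := parseRange(r)
		fmt.Println(r, offset, length, err)
	}
}
```

Multi-range requests (`bytes=0-9,20-29`) are deliberately rejected in this sketch, since the comma ends up in the end position and fails integer parsing.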